Machine Learning Final Project: Handwritten Sanskrit Recognition using a Multi-class SVM with K-NN Guidance
Authors
Abstract
We develop an optical character recognition (OCR) engine for handwritten Sanskrit using a two-stage classifier. Within the standard OCR pipeline, we focus on the classification problem, assuming the characters have already been adequately preprocessed. One challenge is that Sanskrit has about a hundred core characters. Model-driven methods, such as the Support Vector Machine (SVM), must search a combinatorial model space that grows exponentially during training, while data-driven methods, such as k-nearest neighbor (kNN), become computationally costly at test time. To address this challenge, we propose a two-stage multiclass classifier that uses a non-parametric model to reduce the model space to be searched and a parametric model to relieve the computational burden while generalizing better. In the first stage, we apply kNN to coarsely assign a test sample to a group of k candidate classes; in the second stage, a k-class classifier labels the sample. Our method is fully automatic, highly accurate, and computationally efficient.
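The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D data, the choice of a linear SVM trained by stochastic subgradient descent, the one-vs-rest scheme, and all hyperparameters are assumptions made only for illustration. Stage 1 takes the distinct labels among the k nearest training samples as the candidate classes; stage 2 trains SVMs on those classes only and picks the highest-scoring one.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_candidates(x, train_X, train_y, k):
    """Stage 1: distinct labels among the k nearest training samples."""
    nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))[:k]
    return sorted({train_y[i] for i in nearest})


def train_binary_svm(X, y, epochs=300, lr=0.01, lam=0.01, seed=0):
    """Linear SVM (labels +1/-1): SGD on the L2-regularised hinge loss.

    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1 - lr * lam) for wj in w]  # shrink from the L2 penalty
            if y[i] * score < 1:  # margin violated: hinge subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def two_stage_predict(x, train_X, train_y, k):
    """Stage 1 narrows the label set with kNN; stage 2 decides with SVMs."""
    candidates = knn_candidates(x, train_X, train_y, k)
    if len(candidates) == 1:
        return candidates[0]
    # Stage 2: one-vs-rest linear SVMs trained only on the candidate classes.
    subset = [(xi, yi) for xi, yi in zip(train_X, train_y) if yi in candidates]
    X_sub = [xi for xi, _ in subset]
    best_label, best_score = None, float("-inf")
    for label in candidates:
        y_bin = [1 if yi == label else -1 for _, yi in subset]
        w, b = train_binary_svm(X_sub, y_bin)
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical): three well-separated classes in two dimensions.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1),
           (5, 0), (5, 1), (6, 0), (6, 1),
           (0, 5), (1, 5), (0, 6), (1, 6)]
train_y = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

if __name__ == "__main__":
    query = (0.8, 0.4)
    print(knn_candidates(query, train_X, train_y, k=6))
    print(two_stage_predict(query, train_X, train_y, k=6))
```

This sketch retrains the second-stage SVMs for every query, which keeps the code short but is wasteful; a practical system would presumably train or cache classifiers for recurring candidate groups ahead of time.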
Similar resources
A Comparative Study of SVM Models for Learning Handwritten Arabic Characters
In order to select the best SVM model for a specific machine learning task, a comparative study of SVM models is presented in this paper. We investigate the case of learning handwritten Arabic characters and we make use of tabu search metaheuristic in order to scan a large space of SVM models including multi-class scheme (one-against-one or one-against-all), SVM kernel function and kernel param...
Handwritten digit Recognition using Support Vector Machine
Handwritten numeral recognition plays a vital role in postal automation services, especially in countries like India where multiple languages and scripts are used. Discrete Hidden Markov Model (HMM) and hybrid of Neural Network (NN) and HMM are popular methods in handwritten word recognition system. The hybrid system gives better recognition result due to better discrimination capability of the N...
Fourier Descriptor based Isolated Marathi Handwritten Numeral Recognition
Numeral recognition remains one of the most important problems in pattern recognition. To the best of our knowledge, little work has been done in Devnagari script compared with those for non Indian scripts like Latin, Chinese and Japanese. In this paper we propose an effective method for recognition of isolated Marathi handwritten numerals written in Devnagari script. Fourier Descriptors that d...
Handwritten Devanagari Word Recognition: A Curvelet Transform Based Approach
This paper presents a new offline handwritten Devanagari word recognition system. Though Devanagari is the script for Hindi, which is the official language of India, its character and word recognition pose great challenges due to the large variety of symbols and their proximity in appearance. In order to extract features which can distinguish similar appearing words, we employ Curvelet Tr...
Online Handwritten Digit Recognition Using Gaussian Based Classifier
Discrete Hidden Markov Model (HMM) and hybrid of Neural Network (NN) and HMM are popular methods in handwritten word recognition system. The hybrid system gives better recognition result due to better discrimination capability of the NN. A major problem in handwriting recognition is the huge variability and distortions of patterns. Elastic models based on local observations and dynamic programm...